Convergence of Constant Step Stochastic Gradient Descent for Non-Smooth Non-Convex Functions

Authors

Abstract

This paper studies the asymptotic behavior of constant-step Stochastic Gradient Descent for the minimization of an unknown function, defined as the expectation of a non-convex, non-smooth, locally Lipschitz random function. As the gradient may not exist, it is replaced by a certain operator: a reasonable choice is to use an element of the Clarke subdifferential of the random function; another choice is the output of the celebrated backpropagation algorithm, which is popular amongst practitioners, and whose properties have recently been studied by Bolte and Pauwels. Since the chosen operator is not in general a selection of the Clarke subdifferential of the mean function, it has been assumed in the literature that an oracle of the Clarke subdifferential of the mean function is available. As a first result, it is shown in this paper that such an oracle is not needed for almost all initialization points of the algorithm. Next, in the small-step-size regime, it is shown that the interpolated trajectory of the algorithm converges in probability (in the compact convergence sense) towards the set of solutions of a particular differential inclusion: the subgradient flow. Finally, viewing the iterates as a Markov chain whose transition kernel is indexed by the step size, it is shown that the invariant distributions of this kernel converge weakly to the set of invariant distributions of this differential inclusion as the step size tends to zero. These results show that when the step size is small, with large probability, the iterates eventually lie in a neighborhood of the critical points of the mean function.
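As a toy illustration of the scheme described above, the following sketch runs constant-step SGD on f(x) = |x|, replacing the missing gradient with a selection of the Clarke subdifferential. This is only a minimal sketch: the objective, the noise model, and the step size are illustrative assumptions, not the paper's setting.

```python
import random

def clarke_subgrad_abs(x):
    """One element of the Clarke subdifferential of |x| (0 is chosen at the kink;
    any value in [-1, 1] would be a valid selection there)."""
    if x > 0:
        return 1.0
    if x < 0:
        return -1.0
    return 0.0

def constant_step_sgd(x0, gamma, n_iter, seed=0):
    """Constant-step SGD on f(x) = |x| with additive zero-mean noise on the oracle."""
    rng = random.Random(seed)
    x = x0
    for _ in range(n_iter):
        noise = rng.uniform(-0.5, 0.5)  # stochastic oracle: subgradient + noise
        x = x - gamma * (clarke_subgrad_abs(x) + noise)
    return x

# With a small constant step, the iterates end up oscillating in an
# O(gamma) neighborhood of the critical point x* = 0, echoing the
# abstract's conclusion; they do not converge to x* exactly.
x_final = constant_step_sgd(x0=5.0, gamma=0.01, n_iter=5000)
```

Note that because the step size is constant, the final iterate depends on the noise realization; only its distance to the critical point is controlled.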


Related articles

Convergence diagnostics for stochastic gradient descent with constant step size

Iterative procedures in stochastic optimization typically consist of a transient phase and a stationary phase. During the transient phase the procedure converges towards a region of interest, and during the stationary phase the procedure oscillates in a convergence region, commonly around a single point. In this paper, we develop a statistical diagnostic test to detect such phase transiti...


Convergence Rate of Sign Stochastic Gradient Descent for Non-convex Functions

The sign stochastic gradient descent method (signSGD) utilises only the sign of the stochastic gradient in its updates. For deep networks, this one-bit quantisation has surprisingly little impact on convergence speed or generalisation performance compared to SGD. Since signSGD is effectively compressing the gradients, it is very relevant for distributed optimisation where gradients need to be a...
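The one-bit update described above can be sketched on a toy quadratic. The objective, noise model, and step size below are illustrative assumptions, not the paper's experiments:

```python
import random

def sign(v):
    """Sign of a scalar: -1, 0, or 1."""
    return (v > 0) - (v < 0)

def signsgd_step(x, stoch_grad, lr):
    """signSGD update: each coordinate moves by lr against the SIGN of its
    stochastic gradient, so only one bit per coordinate is used."""
    return [xi - lr * sign(gi) for xi, gi in zip(x, stoch_grad)]

# Minimize f(x) = sum(xi^2) using noisy gradients 2*xi + noise.
rng = random.Random(0)
x = [3.0, -2.0]
for _ in range(2000):
    g = [2 * xi + rng.uniform(-0.1, 0.1) for xi in x]
    x = signsgd_step(x, g, lr=0.01)
# Each coordinate ends up oscillating within roughly lr of the minimizer 0.
```

Since the update magnitude is fixed at lr per coordinate, signSGD with a constant step also stabilizes in a neighborhood of the solution rather than converging exactly.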


Global Convergence of Stochastic Gradient Descent for Some Non-convex Matrix Problems

The Burer-Monteiro [1] decomposition (X = YY^T) with stochastic gradient descent is commonly employed to speed up and scale up matrix problems including matrix completion, subspace tracking, and SDP relaxation. Although it is widely used in practice, there exist no known global convergence results for this method. In this paper, we prove that, under broad sampling conditions, a first-order ra...
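A minimal sketch of this approach on a toy symmetric rank-1 matrix-completion instance; the problem size, sampling rate, initialization scale, and step size are all illustrative assumptions, not the paper's setting:

```python
import numpy as np

rng = np.random.default_rng(0)

# Ground-truth rank-1 symmetric matrix X* = y* y*^T and a random set of observed entries.
n, r = 8, 1
y_true = rng.normal(size=(n, r))
X_true = y_true @ y_true.T
observed = [(i, j) for i in range(n) for j in range(n) if rng.random() < 0.6]

# Burer-Monteiro factorization X = Y Y^T, fit by SGD on one observed entry at a time.
Y = rng.normal(scale=0.1, size=(n, r))
lr = 0.05
for epoch in range(300):
    rng.shuffle(observed)
    for i, j in observed:
        resid = float(Y[i] @ Y[j]) - X_true[i, j]  # error on this observed entry
        gi, gj = resid * Y[j], resid * Y[i]        # gradients w.r.t. rows Y[i], Y[j]
        Y[i] -= lr * gi
        Y[j] -= lr * gj

# Worst-case absolute error on the observed entries after training.
err = max(abs(float(Y[i] @ Y[j]) - X_true[i, j]) for i, j in observed)
```

The factorization keeps only n*r parameters instead of n^2, which is what makes this decomposition attractive at scale; the price is a non-convex objective, which is exactly why global convergence results are non-trivial.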


Stochastic Gradient Descent for Non-smooth Optimization: Convergence Results and Optimal Averaging Schemes

Stochastic Gradient Descent (SGD) is one of the simplest and most popular stochastic optimization methods. While it has already been theoretically studied for decades, the classical analysis usually required nontrivial smoothness assumptions, which do not apply to many modern applications of SGD with non-smooth objective functions such as support vector machines. In this paper, we investigate t...


Stochastic gradient descent algorithms for strongly convex functions at O(1/T) convergence rates

With a weighting scheme proportional to t, a traditional stochastic gradient descent (SGD) algorithm achieves a high-probability convergence rate of O(κ/T) for strongly convex functions, instead of O(κ ln(T)/T). We also prove that an accelerated SGD algorithm achieves a rate of O(κ/T).
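The t-proportional weighting can be sketched on a toy strongly convex problem. Here f(x) = x^2/2 and the noise model are illustrative assumptions (with μ = 1, so the κ dependence is not visible in this sketch):

```python
import random

def sgd_with_weighted_average(x0, T, seed=0):
    """SGD on the strongly convex f(x) = x^2 / 2 with noisy gradients,
    returning the average of the iterates weighted proportionally to t."""
    rng = random.Random(seed)
    x = x0
    num, den = 0.0, 0.0
    for t in range(1, T + 1):
        g = x + rng.gauss(0, 1)   # stochastic gradient: f'(x) = x, plus noise
        x = x - (1.0 / t) * g     # standard 1/(mu*t) step size, with mu = 1
        num += t * x              # iterate t gets weight proportional to t
        den += t
    return num / den

# The t-weighted average suppresses the noise of individual iterates and
# lands close to the minimizer x* = 0.
x_bar = sgd_with_weighted_average(x0=5.0, T=20000)
```

The point of the t-proportional weights is to down-weight the early, inaccurate iterates; uniform averaging over all T iterates is what incurs the extra ln(T) factor.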



Journal

Journal title: Set-valued and Variational Analysis

Year: 2022

ISSN: 1877-0541, 1877-0533

DOI: https://doi.org/10.1007/s11228-022-00638-z